99 research outputs found

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Perhaps the most straightforward classifier in the arsenal of machine learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods. Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
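
    The idea described in this abstract (find the nearest neighbours of a query, then let them decide its class) can be sketched in a few lines of Python. This is a minimal illustration and not the code from the paper's appendix: the function name `knn_classify`, the choice of Euclidean distance, the majority vote and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(query, examples, k=3):
    """Classify `query` by majority vote among its k nearest examples.

    `examples` is a list of (feature_vector, label) pairs. Euclidean
    distance is assumed here; the paper surveys other measures.
    """
    # Rank all stored examples by distance to the query and keep the k closest.
    neighbours = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    # Each neighbour casts one vote for its own label.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy data, invented for illustration only.
training = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.5, 4.5), "b"),
]

print(knn_classify((1.1, 1.0), training, k=3))  # prints "a"
```

    The brute-force sort over every stored example is the simplest possible retrieval step; it is the cost that speed-up techniques of the kind the abstract mentions aim to reduce.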

    An analysis of interactions within and between extreme right communities in social media

    Many extreme right groups have had an online presence for some time through the use of dedicated websites. This has been accompanied by increased activity in social media websites in recent years, which may enable the dissemination of extreme right content to a wider audience. In this paper, we present exploratory analysis of the activity of a selection of such groups on Twitter, using network representations based on reciprocal follower and mentions interactions. We find that stable communities of related users are present within individual country networks, where these communities are usually associated with variants of extreme right ideology. Furthermore, we also identify the presence of international relationships between certain groups across geopolitical boundaries

    k-Nearest Neighbour Classifiers - A Tutorial

    Perhaps the most straightforward classifier in the arsenal of Machine Learning techniques is the Nearest Neighbour Classifier: classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance are not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on: mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours, and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods

    Underestimation Bias and Underfitting in Machine Learning

    Often, what is termed algorithmic bias in machine learning will be due to historic bias in the training data. But sometimes the bias may be introduced (or at least exacerbated) by the algorithm itself. The ways in which algorithms can actually accentuate bias have not received a lot of attention, with researchers focusing directly on methods to eliminate bias, no matter the source. In this paper we report on initial research to understand the factors that contribute to bias in classification algorithms. We believe this is important because underestimation bias is inextricably tied to regularization, i.e. measures to address overfitting can accentuate bias

    A system for automatically annotating traditional Irish music field recordings

    This paper presents MATT2 (Machine Annotation of Traditional Tunes). MATT2 is a novel system which can automatically annotate field recordings of traditional Irish music with useful metadata such as tune name, key signature, time signature, composer and discography. MATT2 works by using a number of algorithms to automatically transcribe the digital audio to be annotated into the ABC music notation language. It then compares these transcriptions against a corpus of 860 human-made transcriptions in ABC using a variation of the edit distance algorithm. Results using MATT2 to annotate fifty recordings of flute and fiddle tunes demonstrate a high success rate at annotating recordings made by different musicians. Additionally, several of the recordings successfully annotated in testing MATT2 were recorded in imperfect conditions, with badly degraded audio
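
    The matching step in this abstract, comparing a machine transcription against a corpus of known transcriptions by edit distance, can be sketched as follows. The abstract does not specify which variation of edit distance MATT2 uses, so plain Levenshtein distance is swapped in here; the names `edit_distance` and `best_match`, and the two short note strings, are hypothetical and made up for the example.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings (an assumed
    stand-in for the unspecified variation used by MATT2)."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_match(transcription, corpus):
    """Return the name of the corpus entry closest to `transcription`."""
    return min(corpus, key=lambda name: edit_distance(transcription, corpus[name]))


# Invented note sequences, not taken from the real 860-tune corpus.
corpus = {"tune_a": "DEDGABAG", "tune_b": "GABcdedB"}

print(best_match("DEDGABAB", corpus))  # prints "tune_a"
```

    A tolerance for small distances is what lets this kind of matching cope with transcription errors, such as those that degraded audio would introduce.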

    A Comparison of Ensemble and Case-Base Maintenance Techniques for Handling Concept Drift in Spam Filtering

    The problem of concept drift has recently received considerable attention in machine learning research. One important practical problem where concept drift needs to be addressed is spam filtering. The literature on concept drift shows that among the most promising approaches are ensembles, and a variety of techniques for ensemble construction has been proposed. In this paper we compare the ensemble approach to an alternative lazy learning approach to concept drift whereby a single case-based classifier for spam filtering keeps itself up-to-date through a case-base maintenance protocol. We present an evaluation that shows that the case-base maintenance approach is more effective than a selection of ensemble techniques. The evaluation is complicated by the overriding importance of False Positives (FPs) in spam filtering. The ensemble approaches can have very good performance on FPs because it is possible to bias an ensemble more strongly away from FPs than it is to bias the single classifier. However, this comes at considerable cost to the overall accuracy

    A Case-based Technique for Tracking Concept Drift in Spam Filtering

    Clearly, machine learning techniques can play an important role in filtering spam email because ample training data is available to build a robust classifier. However, spam filtering is a particularly challenging task as the data distribution and concept being learned change over time. This is a particularly awkward form of concept drift as the change is driven by spammers wishing to circumvent the spam filters. In this paper we show that lazy learning techniques are appropriate for such dynamically changing contexts. We present a case-based system for spam filtering called ECUE that can learn dynamically. We evaluate its performance as the case-base is updated with new cases. We also explore the benefit of periodically redoing the feature selection process to bring new features into play. Our evaluation shows that these two levels of model update are effective in tracking concept drift

    Down the (white) rabbit hole: the extreme right and online recommender systems

    In addition to hosting user-generated video content, YouTube provides recommendation services, where sets of related and recommended videos are presented to users, based on factors such as co-visitation count and prior viewing history. This article is specifically concerned with extreme right (ER) video content, portions of which contravene hate laws and are thus illegal in certain countries, which are recommended by YouTube to some users. We develop a categorization of this content based on various schema found in a selection of academic literature on the ER, which is then used to demonstrate the political articulations of YouTube’s recommender system, particularly the narrowing of the range of content to which users are exposed and the potential impacts of this. For this purpose, we use two data sets of English and German language ER YouTube channels, along with channels suggested by YouTube’s related video service. A process is observable whereby users accessing an ER YouTube video are likely to be recommended further ER content, leading to immersion in an ideological bubble in just a few short clicks. The evidence presented in this article supports a shift of the almost exclusive focus on users as content creators and protagonists in extremist cyberspaces to also consider online platform providers as important actors in these same spaces